This paper draws attention to literature surrounding the subject of computer-assisted assessment (CAA). A brief overview of traditional methods of assessment is presented, highlighting areas of concern in existing techniques. CAA is then defined, and instances of its introduction in various educational spheres are identified, with the main focus of the paper concerning the implementation of CAA. Through referenced articles, evidence is offered to inform practitioners, and direct further research into CAA from a technological and pedagogical perspective. This includes issues relating to interoperability of questions, security, test construction and testing higher cognitive skills. The paper concludes by suggesting that an institutional strategy for CAA coupled with staff development in test construction for a CAA environment can increase the chances of successful implementation.


Introduction 

This paper presents evidence that the more traditional methods of assessment within 
universities have their limitations. As a result of these limitations and also the 
continued increase in the use of technology to deliver curriculum, the gap between 
assessment methods and learning is widening. 

Students entering higher education directly from schools and colleges are likely to have been exposed to Information Technology as part of the UK National Curriculum. Pilot studies of web-based delivery of summative assessment in schools (Ashton et al., 2003; Nugent, 2003) and of basic key skills tests in both Learn Direct and army centres (Sealey et al., 2003) indicate that CAA can successfully assess students and provide timely feedback on class and individual progress. There is also empirical evidence to suggest that students find CAA an acceptable assessment technique (Sambell et al., 1999; Croft et al., 2001; Ricketts & Wilks, 2002a). It could therefore be argued that, for many students, CAA may become a more widely used method of assessment in schools, further education and universities. Many universities are now using technology in their assessment strategies (Stephens & Mascia, 1997), and by examining the literature, lessons can be learned to facilitate the successful implementation of computer-assisted assessment.


Methodology 

A comprehensive literature review was conducted, primarily using online resources, although library resources were also used. The search centred on the databases Ingenta, AACE and Science Direct, and on conference proceedings such as those of the International Computer Assisted Assessment Conference. Keyword searching was problematic and time-consuming; for example, a search using ‘computer assessment’ would produce a divergent array of more than one thousand articles. Other search terms included ‘computer based testing’, ‘computer-based assessment’, ‘computer aided assessment’ and ‘e-assessment’. The contents of entire journals, such as ‘Assessment and Evaluation in Higher Education’, were also browsed.


Assessment in general 

Academic assessment can be administered through various techniques. Fifty different techniques have been identified and used for assessment purposes within higher education (Knight, 2001); the most commonly used are exams and essays (Graham, 2004). However, this does not include all the methods now available within CAA packages, which, for example, incorporate questions that make use of multimedia. New assessment techniques will continue to emerge as technology and teaching methods change and develop; therefore, continuing research will be required to determine the effectiveness and appropriateness of these methods.

Each form of assessment, whether computer based or traditional, presents its own difficulties. Essays present the problem of double marking: in one study the two markers agreed only 52% of the time (Powers et al., 2002). Additionally, there are problems with cheating, as Internet sites offer custom-written and off-the-shelf essays (Crisp, 2002). It has been suggested that exams tend to encourage surface learning (Race, 1995) and may cause increased anxiety, resulting in significantly lower scores (Cassady & Johnson, 2002). Multiple choice questions (MCQ) are used in both offline and CAA exams and raise a number of concerns, for example grade deflation through not enabling partial credit (Baranchik & Cherkas, 2000), poorly designed questions (Paxton, 2000; Jafarpur, 2003) and guessing (Burton, 2001). However, the advantages of using computers to deliver MCQ include automated marking for lecturers (Pollock et al., 2000) and, for formative purposes, the opportunity for students to study at their own pace, repeat questions and receive instant feedback (Loewenberger & Bull, 2003). It is the potential advantages of CAA that have driven research into ways to overcome these difficulties.

Ultimately, in an academic environment the marks from summative assessment are accumulated to award an overall grade, and there are concerns over comparability across subject domains. It has been suggested that the scientific subjects produce more First Class Degrees than the humanities because their marking criteria use the full range of marks, and because subjectivity is eliminated where there is a predefined correct answer (Yorke et al., 2002; Homey, 2003). These findings would appear to be further corroborated by Higher Education Statistics Agency (HESA) figures. Of students graduating from UK universities in 2001/02, 25.5% of those in Mathematical Science gained a First Class Degree, compared with 10.4% in Humanities (HESA, 2002), and this trend was also evident in other years, for example 1994/95 (HESA, 1995). CAA, like mathematics and some science subjects, also tends to use the full range of marks; therefore, the trend towards a high proportion of First Class Degrees may occur in other subject domains that adopt this technique in the future.

There is pressure on lecturers not to fail students, and one study found that in professional subjects there is a tendency to leave the award of a fail to the next assessor (Hawe, 2003). Lecturers are confronted with emotional and ethical dilemmas when a close working relationship is formed, increasing their reluctance to award a fail (Sabar, 2002). The emotional and subjectivity issues evident in human-centred marking may be removed by the automatic marking offered by CAA software.

It is important to recognize that some of the issues discussed above remain prevalent in CAA, alongside new challenges. Adopting a diverse assessment strategy may lead to a fairer assessment of the student (Race, 1995).

Computer-assisted assessment defined 

The literature shows a lack of consensus regarding the terminology and its definition. Bull and McKenna (2001) argue that computer-assisted assessment is the common term for the use of computers in the assessment of students, and that other terms tend to focus on particular activities. The definition of CAA used in this review will therefore be that CAA encompasses the use of computers to deliver, mark or analyse assignments or exams.

Variations in CAA 

Within higher education institutions, CAA has been applied in a number of different ways, including adaptive testing (Latu & Chapman, 2002; Mills et al., 2002), analysis of the content of discussion boards (Macdonald & Twining, 2002; Wiltfelt et al., 2002), automated essay marking (Christie, 1999; Burstein et al., 2001), delivery of exam papers (Sim et al., 2003) and objective testing (Walker & Thompson, 2001; Pain & Le Heron, 2003). These methods vary considerably; however, this review of research will focus on the issues relating to implementing objective tests via CAA.

Testing cognitive skills with CAA 

There is concern in the literature relating to CAA and its ability to test higher cognitive skills across subject domains (Daly & Waldron, 2002; Paterson, 2002). The higher cognitive skills are often associated with ‘Analysis, Synthesis, and Evaluation’ as defined in Bloom’s Taxonomy (Bloom, 1956). However, a revised taxonomy takes into consideration the ‘Knowledge Dimension’ (Anderson & Krathwohl, 2001) and this has also been used in CAA research for classification of questions (King & Duke-Williams, 2002; Mayer, 2002).

Paterson (2002) indicated that it is not feasible to test the higher-level cognitive 
skills using CAA within mathematics. Bloom states that in the majority of instances 
Synthesis and Evaluation promote divergent thinking and answers cannot be 
determined in advance (Bloom et al., 1971). Heinrich and Wang (2003) argue that
objective testing is still not sophisticated enough to examine complex content and 
thinking patterns. However, other research in linguistics and computer programming 
concluded that the higher-level skills can be assessed via CAA through innovative 
approaches (Cox & Clark, 1998; Reid, 2002). In the study by Reid (2002) a new 
language was devised and students were required to apply linguistic techniques in 
order to answer MCQ. It has been suggested that CAA tests of higher-level skills are 
more complex and costly to produce (Dowsing, 1998) and this may be because more 
innovative approaches are needed. 


Question styles 

Objective testing has been used within assessment for over forty years (Wood, 1960) 
and computer programs delivering MCQ date back to the 1970s (Morgan, 1979). 
More sophisticated question styles have emerged enabling more diverse assessment 
methods. The question styles delivered by the TRIADS software developed at Derby 
University are evidence of this evolution, offering 17 question styles in 1999 
(Mackenzie, 1999) and 39 in 2003 (CIAD, 2003). However, staff at the University of Liverpool using TRIADS found that this presented an additional problem, as they were unfamiliar with the new question styles and lacked confidence in writing suitable questions (McLaughlin et al., 2004). Staff development in question writing, supported by guidelines, can help overcome these problems. For example, Haladyna (1996) has developed generic guidelines, Herd and Clark (2002) present examples of the various question styles used in further education, and examples used within higher education can be found at http://www.caacentre.ac.uk.

Although there is a large number of possible formats for CAA questions, they can be classified into four distinct groups based on the human interaction technique required (CIAD, 2003): point and click, move object, text entry and draw object.


Point and click 

Point and click questions include Multiple Choice (MCQ) and Multiple Response (MRQ) items, both of which have been used in assessment practice for a considerable time and as a result are often converted for CAA (Ricketts & Wilks, 2002b). Ebel (1972) suggests that any understanding or ability that can be tested by means of any other technique, for instance essays, can also be tested by MCQ. More complex MCQ can be devised through assertion-reason questions, enabling the testing of higher cognitive skills (Bull & McKenna, 2001). Both MCQ and MRQ have inherent problems, such as reliance on true/false style questions, which students might perceive to be unfair (Wood, 1960). Davies (2002) also argues that the quality of an MCQ depends on the quality of its distracters rather than of the question itself.


Move object 

Move object style questions require students to move objects to predetermined positions on the screen. They are a variation of the MCQ format and are good for assessing students’ understanding of relationships (Bull & McKenna, 2001). For example, in computing they could be used for labelling entity relationship diagrams, and in linguistics students could be presented with a poem and asked to move highlighted words to the appropriate word class. One problem is that when the number of movable objects equals the number of targets, a student who knows all but one answer will automatically get full marks, because the last object can only go in the one remaining position (Wood, 1960).
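
The arithmetic behind Wood's observation can be made explicit. The sketch below assumes that each target accepts exactly one object and that objects cannot be reused, and computes the chance that a student completes the remaining targets correctly by pure guessing. The function name `p_guess_remaining` is hypothetical, and adding surplus distracter objects (more objects than targets) is an assumed remedy, not one prescribed by the sources cited here.

```python
from fractions import Fraction
from math import perm


def p_guess_remaining(n_objects: int, n_targets: int, n_known: int) -> Fraction:
    """Probability that a student who places `n_known` objects correctly
    fills every remaining target correctly by random guessing.

    Assumes each target takes exactly one object and objects are not reused.
    """
    unknown_targets = n_targets - n_known
    free_objects = n_objects - n_known
    # Ordered ways to place the free objects on the unknown targets;
    # exactly one of these arrangements is fully correct.
    arrangements = perm(free_objects, unknown_targets)
    return Fraction(1, arrangements)


# Five objects, five targets, four known: the last placement is forced.
print(p_guess_remaining(5, 5, 4))  # 1
# Eight objects (three surplus distracters), same student: 1 in 4.
print(p_guess_remaining(8, 5, 4))  # 1/4
```

With equal numbers of objects and targets the final placement carries no information about the student's knowledge; surplus objects keep a choice open for every target.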


Text entry 

Text entry questions require students to input short predefined answers, such as factual knowledge or syntax in computer programming. An advantage of this format is that students must supply the correct answer, which removes the possibility of guessing (Bull & McKenna, 2001), and this style has been found to be the most demanding format for students (Reid, 2002). There are problems associated with text entry in some subject domains, such as mathematics, because mathematical expressions cannot easily be entered in most commercial software (Croft et al., 2001; Paterson, 2002). Another problem with this question style is that an answer may be marked incorrect because of a spelling mistake, and the time saved may be reduced if lecturers need to check responses manually for spelling errors.
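
One way to limit this manual checking is to normalise responses and route near misses to a lecturer rather than rejecting them outright. The sketch below is an assumption, not a feature of any package cited here: it accepts exact matches after normalising case and whitespace, flags answers within a small edit (Levenshtein) distance for review, and rejects the rest. The function names and the default tolerance of one edit are hypothetical; for programming syntax, where a one-character difference can change the meaning, a tolerance of zero may be more appropriate.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def normalise(text: str) -> str:
    """Lower-case and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def mark_text_entry(response: str, accepted: list[str], tolerance: int = 1) -> str:
    """Return 'correct', 'review' (possible spelling slip) or 'incorrect'."""
    r = normalise(response)
    best = min(levenshtein(r, normalise(a)) for a in accepted)
    if best == 0:
        return "correct"
    if best <= tolerance:
        return "review"
    return "incorrect"


print(mark_text_entry("  Photosynthesis ", ["photosynthesis"]))  # correct
print(mark_text_entry("photosynthsis", ["photosynthesis"]))     # review
print(mark_text_entry("respiration", ["photosynthesis"]))       # incorrect
```

Only the "review" cases need a lecturer's attention, so the manual effort scales with the number of near misses rather than with the size of the cohort.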


Draw object 

Draw object questions require students to draw simple objects or lines. For example, students may be required to plot graphs, which can be marked automatically. This style of question is a high discriminator between strong and weak candidates (Mackenzie, 1999). There is little evidence in the literature concerning the effectiveness of this format, which may be because commercial software such as Questionmark and I-Assess does not include this style in its templates.
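
The sources do not describe how a plotted graph is marked automatically, so the following is only a sketch under stated assumptions: the student's drawing is captured as a list of (x, y) points, the expected graph is a known function, and a point counts as correct if it lies within a fixed vertical tolerance of that function. The function names, the tolerance and the rule that every point must match are all hypothetical choices.

```python
from typing import Callable, Sequence


def mark_plot(points: Sequence[tuple[float, float]],
              expected: Callable[[float], float],
              tolerance: float = 0.5) -> bool:
    """Return True if every plotted point lies within `tolerance`
    (in y-axis units) of the expected curve."""
    if not points:
        return False  # nothing was drawn
    return all(abs(y - expected(x)) <= tolerance for x, y in points)


def line(x: float) -> float:
    # Target graph for a hypothetical question: plot y = 2x + 1.
    return 2 * x + 1


student_a = [(0, 1.1), (1, 2.9), (2, 5.2)]
student_b = [(0, 1.0), (1, 4.0), (2, 5.0)]
print(mark_plot(student_a, line))  # True
print(mark_plot(student_b, line))  # False
```

Awarding one mark per point within tolerance, instead of requiring all points, would be an equally plausible design and would allow partial credit.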


Interoperability and question banks 

Question banks authored and peer reviewed by academics are emerging, such as that of the Electrical and Electronic Engineering Assessment Network, which developed a database of questions in electrical and electronic engineering (Bull et al., 2002). A single bank will typically require 5000 questions, making it unfeasible for one institution to develop alone (Maughan et al., 2001). Constructing high quality questions is difficult, time-consuming and expensive (Sclater et al., 2003), and issues arise in the interoperability of questions between CAA software packages (Lay & Sclater, 2001). Several international standards have been established to enable interoperability of questions between software applications (Herd & Clark, 2003). These specifications are based on a metadata structure describing questions and how they are grouped together. Unless these interoperability standards are developed and utilized, question banks will have a limited life, as they cannot be used on a variety of delivery platforms (White & Davis, 2000). Systems are emerging that are compliant with IMS-QTI (Instructional Management Systems Question and Test Interoperability specification) to facilitate the exchange of questions (Daly, 2002; Bacon, 2003). The Centre for Educational Technology Interoperability Standards (www.cetis.ac.uk) offers comprehensive resources and information on interoperability issues, which may help direct further research.
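
To make the idea of a shared metadata structure concrete, the sketch below builds a single multiple choice item in the style of IMS QTI version 2.1, a later revision of the specification than those implemented by the systems cited above. Only a minimal subset of elements is used, the item content is hypothetical, and a real question bank would also validate items against the published schema.

```python
import xml.etree.ElementTree as ET

QTI_NS = "http://www.imsglobal.org/xsd/imsqti_v2p1"
MATCH_CORRECT = ("http://www.imsglobal.org/question/qti_v2p1/"
                 "rptemplates/match_correct")


def qti(tag: str) -> str:
    """Qualify a tag name with the QTI 2.1 namespace."""
    return f"{{{QTI_NS}}}{tag}"


def build_mcq(identifier: str, prompt: str, choices: dict[str, str],
              correct: str) -> ET.Element:
    """Build a single-response choice item using a minimal QTI 2.1 subset."""
    item = ET.Element(qti("assessmentItem"), identifier=identifier,
                      title=identifier, adaptive="false",
                      timeDependent="false")
    # What a response looks like, and which value is correct.
    resp = ET.SubElement(item, qti("responseDeclaration"),
                         identifier="RESPONSE", cardinality="single",
                         baseType="identifier")
    correct_el = ET.SubElement(resp, qti("correctResponse"))
    ET.SubElement(correct_el, qti("value")).text = correct
    # The score variable that response processing will set.
    ET.SubElement(item, qti("outcomeDeclaration"), identifier="SCORE",
                  cardinality="single", baseType="float")
    # What the candidate sees.
    body = ET.SubElement(item, qti("itemBody"))
    interaction = ET.SubElement(body, qti("choiceInteraction"),
                                responseIdentifier="RESPONSE",
                                shuffle="true", maxChoices="1")
    ET.SubElement(interaction, qti("prompt")).text = prompt
    for choice_id, text in choices.items():
        ET.SubElement(interaction, qti("simpleChoice"),
                      identifier=choice_id).text = text
    # Standard template: SCORE is 1 if RESPONSE matches correctResponse, else 0.
    ET.SubElement(item, qti("responseProcessing"), template=MATCH_CORRECT)
    return item


ET.register_namespace("", QTI_NS)  # serialise with QTI as the default namespace
item = build_mcq(
    "bloom01",
    "Which level of Bloom's Taxonomy involves judging the value of material?",
    {"A": "Evaluation", "B": "Knowledge", "C": "Comprehension"},
    correct="A",
)
print(ET.tostring(item, encoding="unicode"))
```

Because the correct answer, the scoring rule and the presentation are all declared in a neutral format rather than in one vendor's database, any compliant delivery system can, in principle, import the same item.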


Guessing 

A number of the question styles associated with CAA can lead to artificially high marks through guessing (Bush, 1999), which has implications for setting the pass mark of a test. For example, setting a pass mark of 40% for a test of true/false questions would be inappropriate, as guessing alone would give an average of 50% (Harper, 2002). The problem of guessing may be addressed through various marking schemes, such as post-test correction (Bull & McKenna, 2001), negative marking (Bush, 1999), increasing the number of questions or combining the results from several tests (Burton & Miller, 1999), or increasing the number of distracters and the pass mark (Mackenzie & O’Hare, 2002). It has been suggested that negative marking is not generally implemented in the UK (McAlpine, 2002) and that post-test correction is only suitable for tests using a single question style, because the correction formula varies with the number of distracters (Harper, 2003).
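
The arithmetic behind Harper's example, and behind the dependence of post-test correction on the number of options, can be sketched as follows. The first function gives the exact probability that a student answering every item at random reaches a given pass mark (a binomial tail). The second applies the standard correction-for-guessing formula, score = R - W/(k - 1), where R is the number right, W the number wrong and k the number of options per item. That this formula is what the cited authors mean by post-test correction is an assumption, and the function names are hypothetical.

```python
from math import comb


def p_pass_by_guessing(n_items: int, n_options: int, pass_percent: int) -> float:
    """Probability that answering every item at random reaches the pass mark.

    Assumes one correct option per item and independent, uniform guesses,
    so the number correct follows a binomial distribution.
    """
    p = 1 / n_options
    # Integer ceiling of pass_percent% of n_items (avoids float rounding).
    needed = -(-pass_percent * n_items // 100)
    return sum(comb(n_items, r) * p**r * (1 - p)**(n_items - r)
               for r in range(needed, n_items + 1))


def corrected_score(right: int, wrong: int, n_options: int) -> float:
    """Correction for guessing, R - W/(k - 1); omitted items are not counted."""
    return right - wrong / (n_options - 1)


# 20 true/false items: the expected chance score is 20 * 1/2 = 10, i.e. 50%,
# and a pure guesser clears a 40% pass mark most of the time.
print(round(p_pass_by_guessing(20, 2, 40), 3))  # 0.868
# The same raw result (14 right, 6 wrong) corrected under different option counts.
print(corrected_score(14, 6, 2))  # 8.0
print(corrected_score(14, 6, 4))  # 12.0
```

The last two lines illustrate why a single correction cannot simply be applied across a test that mixes question styles: the same raw result maps to different corrected scores depending on k.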

Statistical analysis has led to the development of various methods to assist test construction and reduce the effects of guessing. An empirical marking simulator to assist in scoring and test construction, based on a base-level guess factor, has been developed (Mackenzie & O’Hare, 2002); this program examines the mark distribution and measurement scale for a set of random answers, enabling tutors to establish the effects of guessing on their assessment. McCabe and Barrett (2003) have also investigated awarding partial credit through a formula based on the mean score of an uneducated guesser. This allows MCQ to be unconstrained, like MRQ, so that students can select more than one answer, with their score weighted according to the number of choices made. For example, for an MCQ with one correct answer, four options and a maximum score of 3, a student who selects two options, one of which is correct, scores only 2 (3 - 1). Davies (2002) combined negative marking with asking students to state their confidence in answering a question before seeing the distracters, and students perceived this to be a fairer test of their abilities.
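
The worked partial-credit example can be expressed as a scoring rule. The source gives only one case (four options, maximum score 3, two options selected including the correct one, score 2). The general rule below is therefore an assumed generalisation, not McCabe and Barrett's published formula: it deducts one mark for each additional option selected and awards nothing if the correct option is omitted. The function name is hypothetical.

```python
def partial_credit(selected: set[str], correct: str, n_options: int) -> int:
    """Score an unconstrained MCQ under an assumed rule generalised from one example.

    Choosing only the correct option earns the maximum (assumed to be
    n_options - 1); each extra option chosen costs one mark; a selection
    that omits the correct option earns nothing.
    """
    if correct not in selected:
        return 0
    max_score = n_options - 1
    return max(max_score - (len(selected) - 1), 0)


print(partial_credit({"A"}, "A", 4))                 # 3
print(partial_credit({"A", "C"}, "A", 4))            # 2, the case in the text
print(partial_credit({"A", "B", "C", "D"}, "A", 4))  # 0
print(partial_credit({"B"}, "A", 4))                 # 0
```

Under this rule a student who selects every option earns nothing, so spreading choices across all options offers no advantage, while a student who can eliminate some distracters still receives credit.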

There is a lack of evidence that any one technique generates more accurate results than any other, and it could be argued that these techniques are unnecessary if tests are well constructed (Bull & McKenna, 2001).


Accessibility 

UK institutions now have to comply with the Special Educational Needs and Disability Act when preparing both teaching and assessment material (SENDA, 2001). In 2000, 22,290 students in UK higher education registered a disability, and this has implications for CAA (Phipps & McCarthy, 2001). For example, a student with dyslexia may need to expend more cognitive resources in interpreting a question, so ensuring that the language is appropriate is a necessity (Wiles & Ball, 2003). In addition, extra time may be required to complete the test, which may necessitate publishing two different versions of an assessment, one with a longer duration. Feedback from one dyslexic student indicated that they thought CAA provided a more level playing field on which to demonstrate their knowledge (Jefferies et al., 2000). Students with visual or physical impairments may struggle to answer move object and draw object style questions without the aid of assistive technology; they may need specially adapted input software and hardware, such as touch screens, eye-gaze systems or speech browsers.

Although guidelines exist for general teaching, there is little evidence that guidelines for inclusive and accessible design in CAA are emerging (Wiles, 2002). For example, when multimedia elements such as video are used within an assessment, an alternative paper-based version may have to be provided for students with sensory impairments. Introducing an alternative, in this instance paper, poses the problem of ensuring comparability (Bennett et al., 1999). When identical tests are presented on computer and on paper they are not comparable (Clariana & Wallace, 2002), because numerous variables affect students’ performance when questions are presented on a computer. These variables include the monitor (Schenkman et al., 1999), the way text is displayed on screen (Dyson & Kipping, 1997), the fact that reading from a monitor is slower than reading from paper (Mayes et al., 2001) and the difficulty of obtaining a feel for the exam when only a single question is presented at a time (Liu et al., 2001). The Web Accessibility Initiative (http://www.w3c.org/WAI/) has produced useful guidelines for promoting online accessibility which may be applicable to CAA, but this initiative does not address the issue of comparability between questions.


Institutional strategies for the adoption of CAA 

The greatest barrier to the adoption of CAA by academics is lack of time, both to develop questions and to learn the software (Warburton & Conole, 2003). This may have contributed to the fact that the adoption of CAA has usually resulted from the impetus of enthusiastic individuals rather than from strategic decisions (O’Leary & Cook, 2001; Daly & Waldron, 2002). The perceived benefit of CAA in freeing lecturers’ time can be illusory if no institutional strategy or support is offered (Stephens, 1994); successful implementation may be left to chance (Stephens et al., 1998), and CAA may be developed in an anarchic fashion (McKenna & Bull, 2000). Research conducted at the University of Portsmouth indicates that there is no time-saving benefit for courses with fewer than twenty students (Callear & King, 1997). To make use of the features within software packages, staff training and development are necessary (Boyle & O’Hare, 2003), and this may not be feasible without institutional support.
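
The existence of a break-even cohort size follows from simple arithmetic: CAA front-loads effort into writing questions and learning the software, while saving time on each script marked. The sketch below uses entirely hypothetical figures, not data from Callear and King (1997), to show how such a threshold might be estimated for a particular course; the function name is also hypothetical.

```python
from math import floor


def break_even_cohort(extra_setup_hours: float,
                      manual_mins_per_script: float,
                      caa_mins_per_script: float) -> int:
    """Smallest cohort for which CAA saves time overall.

    extra_setup_hours: one-off extra effort for CAA (writing questions,
    learning the software) compared with an equivalent paper test.
    """
    saving_per_script = manual_mins_per_script - caa_mins_per_script
    if saving_per_script <= 0:
        raise ValueError("CAA must be quicker per script to ever save time")
    # CAA is ahead once cohort * saving_per_script exceeds the setup time.
    return floor(extra_setup_hours * 60 / saving_per_script) + 1


# Hypothetical figures: 8 extra hours of setup, 12 minutes to mark a paper
# script by hand, 2 minutes per script to check CAA results.
print(break_even_cohort(8, 12, 2))  # 49
```

Reusing questions in later years spreads the setup cost over several cohorts, which lowers the threshold; this is one reason why shared question banks and institutional support matter.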

Institutions adopting CAA face the difficulty of evaluating and deciding upon the most appropriate CAA software. Without an institutional strategy, individual departments may adopt their own systems (O’Leary & Cook, 2001). This results in students having to cope with a number of different user interfaces and CAA formats, increased licence costs, and problems in offering administrative and technical support. Even where an institution has a clear strategy, there are problems in determining the selection criteria for software used to deliver assessment, and there is a lack of analysis of this issue within the literature (Valenti et al., 2002). Sclater and Howie (2003) contributed to this literature by defining the ultimate online assessment engine. This was achieved by examining the user requirements of the system and establishing the stakeholders and their functional requirements. This research may help institutions to identify their needs and establish an appropriate evaluation methodology.

The following guidelines for an institutional strategy have been formulated by Loughborough University and the University of Luton: establish a coordinated CAA management policy for CAA unit(s) and each discipline on campus; establish a CAA unit; establish CAA discipline groups/committees; provide funding; organize staff development programmes; establish evaluation procedures; identify technical issues; establish operational and administrative procedures (Stephens et al., 1998).

BS7988 is a new British Standard code of practice governing the use of information technology in the delivery of assessments (BS7988, 2002). Its guidelines have various implications for the delivery of assessments; for example, it recommends that students take a break after 1.5 hours, which has an impact on the invigilation process. If this recommendation is followed, procedures need to be established to prevent collusion between students during the break, or the tests need to be split into two separate sections. One of the difficulties for many institutions using CAA arises from a lack of resources to accommodate large cohorts of students sitting an exam simultaneously (Mackenzie et al., 2004). This problem can be alleviated through institutional support; therefore, to fully utilize the benefits of CAA, an institutional strategy would appear necessary to increase the chance of successful implementation. These benefits are evident within a number of institutions with strategies, such as Ulster (Stevenson et al., 2002), Derby (Mackenzie et al., 2002), Coventry (Lloyd et al., 1996) and Loughborough (Croft et al., 2001).


Security

The move away from traditional teaching environments and examination settings presents additional security issues. Frohlich (2000) states that in traditional environments it is possible to ensure the security of exam papers and scripts, including their transportation to and from the exam venue. However, even under this system breaches in security do occur; for example, AQA had to replace 500,000 English and English Literature exam papers after a box had been tampered with (Curtis, 2003).

Tannenbaum (1999) defines security in computer systems as consisting of procedures to ensure that individuals cannot access material for which they do not have authorisation. This is essential within a CAA environment, as questions and student details are stored in a database and test data is usually sent over a local network or the Internet. Before computers were connected to the Internet it was relatively easy to have effective security measures (Mason, 2003), but transmitting sensitive data over an insecure network requires additional security measures to be implemented.

Encryption techniques can be used to ensure the security of questions and answers when data is transmitted over the Internet (Sim et al., 2003). To increase security, examinations can be loaded onto the server at the last minute (Whittington, 1999). If email is used to submit results there is a potential risk due to the lack of authentication (Hatton et al., 2002). Luck and Joy (1999) identified four security requirements: all submissions must be logged; it must be possible to verify that a stored document used for the assessment is the same as the one submitted by the student; a feedback mechanism must inform students that their submission has been received; and the identity of the student must be established.
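
Three of Luck and Joy's four requirements can be sketched with standard cryptographic hashing: each submission is logged with a SHA-256 digest of its content, a stored copy can later be checked against that digest, and the student receives a receipt quoting it. The fourth requirement, establishing identity, is reduced here to an already-authenticated student identifier, which is an assumption. The class and method names are hypothetical, and a production system would also need to protect the log itself, for example by signing it.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SubmissionLog:
    """Append-only record of submissions (requirement 1: log everything)."""
    entries: list[dict] = field(default_factory=list)

    def submit(self, student_id: str, content: bytes) -> str:
        # student_id is assumed to come from a prior authentication step
        # (requirement 4 is not implemented here).
        digest = hashlib.sha256(content).hexdigest()
        self.entries.append({
            "student": student_id,
            "sha256": digest,
            "received": datetime.now(timezone.utc).isoformat(),
        })
        # Requirement 3: a receipt the student can keep.
        return f"Received from {student_id}; SHA-256 {digest}"

    def verify(self, student_id: str, stored: bytes) -> bool:
        """Requirement 2: is the stored copy identical to what was submitted?"""
        digest = hashlib.sha256(stored).hexdigest()
        return any(e["student"] == student_id and e["sha256"] == digest
                   for e in self.entries)


log = SubmissionLog()
receipt = log.submit("s1234", b"Answer script v1")
print(receipt)
print(log.verify("s1234", b"Answer script v1"))  # True
print(log.verify("s1234", b"Answer script v2"))  # False
```

Because the receipt quotes the digest, a student and an examiner can later agree on exactly which document was submitted without either side having to trust the other's copy.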

With most CAA software, students and administrators are required to have passwords, which are often the weakest link in terms of protection (Hindle, 2003). Although it is unlikely, students could gain access to the administrator password and change their results or gain access to the questions. Other concerns include authentication and invigilation of students, which can be particularly problematic in remote locations (Thomas et al., 2002). At present, students enrolled on distance learning courses overseas need to sit exams in a specific location, such as a British Council office, to enable authentication and invigilation. Research is being conducted to overcome these problems, but unless solutions are found, geographical barriers will remain, as students need access to test centres.

During a test, computers need to be locked down to remove the possibility of accessing other content, and secure browsers such as Questionmark Secure have been developed to enable this (Kleeman & Osborne, 2002). There are also operational risks associated with CAA that have security implications, such as the server crashing, and these risks need to be identified and procedures established to minimize them (Zakrzewski & Steven, 2003).

There are standards for information security, for example the British Standard on Information Security Management, BS7799, which has also been adopted as the international standard ISO/IEC 17799. In addition, once data from a test has been collected, institutions within the UK should abide by the Data Protection Act 1998 (Mason, 2003). If such security measures are in place, there is no evidence to suggest that the integrity of an examination is more compromised by delivery over the Internet than by paper.


Conclusion 

The implementation of CAA from a technical and pedagogical perspective is a complex process. The first, and perhaps the most important, lesson that can be learned is that an institutional strategy would seem to greatly increase the chances of success. Recommendations have been made to assist policy makers in formulating an effective strategy. Without institutional support, implementing security procedures such as locking down PCs may be more problematic. However, authentication and invigilation in remote locations remain issues that have yet to be fully resolved.

The other important lesson that can be learned relates to staff development and training in test construction within a CAA environment. Focused staff development may help alleviate a number of issues, such as guessing, testing various cognitive skills, using appropriate question styles and accessibility. The emergence of question banks may also address these issues, depending on their level of interoperability. Another issue is that whilst there are guidelines relating to accessible online content, there are still no formal guidelines relating to CAA.

Reliance on a single method of assessment is problematic, and a diverse assessment strategy is usually necessary. Within an environment of increasing student numbers and a declining staff-to-student ratio, CAA would appear to be a partial solution. This study has highlighted the issues surrounding the implementation of CAA, to both inform and direct further research in the field.